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Long-term synaptic plasticity is fundamental to learning and network function. It has 
been studied under various induction protocols and depends on firing rates, membrane 
voltage, and precise timing of action potentials. These protocols show different facets of 
a common underlying mechanism but they are mostly modeled as distinct phenomena. 
Here, we show that all of these different dependencies can be explained from a 
single computational principle. The objective is a sparse distribution of excitatory 
synaptic strength, which may help to reduce metabolic costs associated with synaptic 
transmission. Based on this objective we derive a stochastic gradient ascent learning 
rule which is of differential-Hebbian type. It is formulated in biophysical quantities and 
can be related to current mechanistic theories of synaptic plasticity. The learning rule 
accounts for experimental findings from all major induction protocols and explains a 
classic phenomenon of metaplasticity. Furthermore, our model predicts the existence of 
metaplasticity for spike-timing-dependent plasticity Thus, we provide a theory of long-term 
synaptic plasticity that unifies different induction protocols and provides a connection 
between functional and mechanistic levels of description. 
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INTRODUCTION 

Synaptic long-term plasticity, the bidirectional modification of 
synaptic strength, has a complex dependence on various fac- 
tors. Among them are direct factors such as correlated pre- and 
postsynaptic firing rates (Bliss and Damo, 1973), postsynaptic 
membrane potential (Artola et al, 1990; Artola and Singer, 1993; 
Ngezahayo et al, 2000), precise timing of pre- and postsynaptic 
spikes (Gerstner et al., 1996; Markram, 1997), repetition fre- 
quency of such timing (Sjostrom et al., 2001), synaptic location 
(Froemke et al., 2005; Sjostrom and Hausser, 2006), as well as 
indirect (or meta-) factors such as previous postsynaptic activ- 
ity (Bienenstock et al., 1982; Wang and Wagner, 1999) and initial 
synaptic strength (Ngezahayo et al., 2000). 

Theories and models for many of these factors (especially 
the timing-dependence) have been studied on different levels 
of abstractions including biophysical (Lisman, 1989; Artola and 
Singer, 1993; Shouval et al., 2002), phenomenological (Pfister 
and Gerstner, 2006; Clopath and Gerstner, 2010; El Boustani 
et al., 2012), and functional (Toyoizumi et al., 2005; Sprekeler 
et al., 2007; Pool and Mato, 2011). Nevertheless, synaptic long- 
term plasticity is induced by a complex molecular machinery. 
Depending on the experimental protocol we are only probing dif- 
ferent facets of this process. But so far, most theories only address 
a specific induction protocol and none of the existing theories 
bridges the gap between the biophysical and the functional level. 

Here, we present a unifying theory that describes long-term 
synaptic plasticity from a single objective function based on 
sparseness. We propose that the primary goal of synaptic long- 
term plasticity is a sparse distribution of synaptic strength and 
that synaptic changes induced by STDP and other induction 



protocols can be understood as a consequence of this objec- 
tive. The synaptic learning rule resulting from this functional 
goal is formulated in biophysical quantities and relies only on 
local quantities. It reproduces results from spike-timing-, rate-, 
and voltage-dependent induction protocols and even metaplas- 
ticity, thus providing a unifying account of synaptic long-term 
plasticity. 

While it has been known for years that the distribution of 
excitatory synaptic strengths is highly skewed (Song et al., 2005; 
Loewenstein et al., 2011) and that models of STDP can produce 
such distributions (van Rossum et al., 2000; Leen and Friel, 2012; 
Zheng et al, 2013), here we postulate sparseness as the objective 
of long-term plasticity. We argue that this is beneficial because 
it reduces metabolic costs and increases coding efficiency. Our 
approach suggests that sparseness may play an even more impor- 
tant role for neural coding than previously thought (Olshausen 
and Field, 1996; Hromadka et al, 2008). 

MODEL 

This work rests on the hypothesis that synaptic long-term plas- 
ticity is actively maximizing the sparseness of the distribution of 
synaptic efficacies. While this goal appears rather simple, achiev- 
ing it is not trivial because a single synapse is severely constrained 
by its computational abilities and what signals are locally avail- 
able. The aim is a local, computationally simple, and biologically 
plausible learning rule which explains the experimental data 
and is consistent with current mechanistic accounts of synaptic 
plasticity. To derive such a plasticity rule, constraints and approx- 
imations have to be considered which the synapse might be using. 
In the following, the assumptions and the main idea for the 
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derivation is sketched, while more details can be found in the 
Methods section and the Appendix. 

PROXY AND SPARSENESS MEASURE 

The goal is a plasticity rule which, when applied locally at every 
synapse, increases the sparseness of the overall distribution of 
synaptic efficacies. This requires every single synapse to have 
a way of estimating this sparseness without having knowledge 
about the full distribution itself. It needs a proxy to gather infor- 
mation about the strengths of other synapses. The derivations in 
this work will be built on a simple, but reasonable candidate: the 
membrane potential u. The idea is that the membrane poten- 
tial can provide the required information throughout the whole 
dendritic tree. Furthermore, sparseness measures like skewness 
and excess kurtosis for the distribution of synaptic efficacies can 
be assumed to be approximately linear in these measures for the 
membrane potential as argued in the next section. 

We assume that the limited computational capacity provided 
by the biophysical mechanisms in the neuron is not sufficient 
to directly compute the complex dependence of the membrane 
potential distribution on the synaptic inputs. A simple approxi- 
mation in terms of a first order expansion of the voltage dynamics 
makes the dependence explicit. The time course of the membrane 
potential is assumed to be linear with additive Gaussian channel 
noise (Steinmetz et al., 2000) over a short time period 8. Thus, we 
assume a Wiener process X§ with a drift term w(fn) and the noise 
strength a: 



u{t 0 + 8) u(f 0 ) + w(f 0 )8 + aX%. 



(1) 



The conditional probability density over this time interval is 
found to be Gaussian as 



the finite values of the EPSCs and just describe the synapses as 
active or inactive. Thus, at every instant in time a small number n 
of all synapses is "active" meaning a presynaptic spike has arrived. 
The total synaptic current can be compared to the sum of synaptic 
efficacies as I tot ~ J^ke active w k- 

By neglecting correlations between different inputs, the total 
synaptic current becomes equivalent to the sum of iid samples 
drawn from the distribution of synaptic efficacies W. 

For a given state in phase space (u, ii), the distribution of 
the membrane potential depends only on the weights Wk of the 
"active" inputs due to the first-order expansion in (1). It follows 
from basic statistics that the skewness of the membrane poten- 
tial is linear in the skewness of W, since the strengths Wk can be 
regarded as iid samples from W. In the long-term average, the 
skewness values are related as S u oc Here, n is the average 
number of simultaneously active inputs. 

Thus, in order to maximize the sparseness of synaptic strength, 
the aim of the learning rule at each synapse is to maximize the 
skewness of the membrane potential which is taken as a proxy. 
It is defined as the third normalized moment of the mean-free 
membrane potential u = u — u: 



(» 3 ) 



(3) 



STOCHASTIC GRADIENT ASCENT AND MOMENTS 

The objective function in form of the skewness is maximized via 
gradient ascent. That means, the derivative of the objective with 
respect to the synaptic efficacy is used in the learning rule: 



dSu 
dw 



(4) 



p s ( M |fo + f) = Af(u; [i(t),o 2 t), 



with [i(t) = m(<o) + tii(to). Apart from voltage-gated ion chan- 
nels and passive conductances, the velocity of the membrane 
potential u depends linearly on the total synaptic current and 
thereby on the distribution of synaptic weights. 

Next, the measure for the sparseness needs to be specified. 
Straightforward choices are the statistical measures of skewness S 
or kurtosis K, which are frequently used in learning algorithms 
such as independent component analysis (Hyvarinen and Oja, 
2000). They have also been used to derive a BCM-type of learn- 
ing rule (Blais et al., 1998). In our approach, both measures lead 
to a very similar learning rule. We will focus on skewness in 
the following. To calculate these measures the probability density 
function, or at least its moments, need to be estimated. 

SKEWNESS RELATION BETWEEN MEMBRANE POTENTIAL AND 
SYNAPTIC EFFICACIES 

Given the proxy variable u, we will argue that, in a rough approx- 
imation, the skewness of the membrane potential is proportional 
to the skewness of the synaptic efficacies. The objective, thus, 
reduces to maximizing the skewness of the membrane potential. 

Comparing the short duration of the EPSC of about 20-100 ms 
to the low average firing rate of cortical neurons, we could neglect 



(2) The weight follows this gradient and evolves such that the skew- 
ness increases. 

Using the full expression of the skewness has two implausi- 
ble requirements: first, the synapse needs to store information 
about the membrane potential distribution across time and sec- 
ond, it must perform the derivative on this complex expression. 
A stochastic gradient ascent avoids both problems. Here, the 
synapse does not estimate the distribution and expectation val- 
ues of u over an extended period of time. Rather, the expectation 
values are calculated for a very short time interval 8 during which 
we can linearize the dynamics [see Equation (1)]. 

The weight follows the gradient of a stochastic instantaneous 
"sample" of the skewness: 



(5) 



On a short time scale, the weight will fluctuate seemingly at ran- 
dom while it performs an optimization very local in time. The 
synapse approximates the complex and unkown skewness land- 
scape and follows the gradient of this approximation. On a longer 
time scale these flucuations average to an overall change which 
optimizes the full skewness expression. However, this greedy opti- 
mization can potentially get stuck in local optima. Thus, with 
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the introduction of a proxy and the optimization via a stochastic 
gradient ascent, the original problem of skewness maximization 
for the weight distribution has been converted from being global 
in time and space to a learning rule being local in time and 
space. 

The skewness is measured relative to the mean membrane 
potential u which defines the reference point for the distribu- 
tion and its moments. The requirement for the stochastic gradient 
ascent to converge is that u is not averaged within each "sample" 
bin. It needs to be estimated over a longer period of time: 



"= (">x s »8- 



(6) 



This long-term averaging of u is important. We will take u to be a 
low-pass filtered version of the voltage dynamics and compute it 
via an exponentially weighted average: 



du 
dt 



u — u 



(7) 



To summarize the introduced approximations: the sparseness of 
the distribution of synaptic efficacies, measured with the normal- 
ized higher-order moment of skewness, is estimated using the 
membrane potential as a proxy. For biological plausibility, the 
sparseness of the membrane potential is maximized in a compu- 
tationally simple way by applying a stochastic gradient ascent. The 
stochastic "samples" are averaged over a very small time interval 
during which the dynamics of the voltage can be approximated by 
a first-order expansion. The mean membrane potential, however, 
is averaged over an extended time period. 

Calculating the "sample" skewness from (5) amounts to evalu- 
ating the moments of the proxy variable over 8. The moments of 
the mean-free membrane potential ti = u — u can be done ana- 
lytically due to the Wiener approximation from (2). The n-th 
moment is found as: 



[u\ = jdu^j dt p(u\t + t 0 ) (u - uf 
= dt j du p(u\t + t 0 ) (u - u) n 
= ^ j dt j duM (u; u,(f) - u, a 2 t) «", 



(8) 



=M„ 



where M„ is just the n-th raw moment of a Gaussian, since the 
order of integration can be interchanged according to Fubini's 
theorem. The moments [u") h follow as 



[u\ = u 2 + ^ (cr 2 + 2ww) 8 + ^w 2 8 2 



(9) 



( M -} 8 = v? + - (ua 2 + u 2 u) 8 + (ua 2 + uii 2 ) S 2 + O (S 3 ) (10) 
) 8 = u 4 + (3uV + 2u i u) 8 + (a 4 + 4uiio 2 + 2u 2 u 2 ) 8 2 



+ O (8 3 ) . 



LEARNING RULE 

The synaptic efficacy (or weight) w is a single number character- 
izing the strength of the synapse. It is the result of a combina- 
tion of different factors, e.g., released neurotransmitter, number 
of receptors, and single channel conductance of the receptors. 
Thus, the weight is a function of different factors xf. w = 
w(x\, . . . , x m ). These are the final variables inducing plasticity 
which, in order to maximize the skewness, follow the differential 
equation 



Xj := T) 



>1 



dxi 

dw du 8S* U 
1 dxi dw 3m ' 



(12) 



(13) 



with learning rates r|;. The gradient decomposes into three terms 
since the skewness only depends on u which depends on w which 
in turn depends on x;. 

The first term 

|^ depends on how the weight and its regulating mecha- 
nisms/variables are modeled. We will restrict ourselves to the 
common postsynaptic factors of number and single channel con- 
ductance of the receptors. For a glutamatergic synapse, there are 
two main types _R of receptors (NMDA and AMPA) each having 
an average maximal single channel conductance gR and a total 
number Nr. The synaptic current of a given receptor type R is 
given by 



Mt) = N Rg R g 0 K (t) (Er - u) . 



(14) 



with the resulting weight wr, the normalized current of a single 
receptor I R 0 ^ (t) and the reversal potential Er. 

Each quantity will be optimized simultaneously with individ- 
ual learning rates: 



N R := T| N 



dw R du dS* u 
3Nr 3wr du 



*\NgR £2 



dw R du dS* u 
gR ■= - — tt = T] g N R £2 
dgR dWR du 



(15) 



(16) 



The common expression U = ^ does not depend on N and 
g. The coupling of the differential equations depends on the learn- 
ing rates r\ s and r^. They determine the behavior of the weight 
change 



wr =gRN R +N R gR. 



(17) 



Here, the learning rates in all simulations were set such that the 
weight change was multiplicative: 



WR = 2^/T\ g -X) N WR£l. 



(18) 



(11) 



An additive learning rule can be obtained with a different set- 
ting of the learning rates but leads to qualitatively similar results. 
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The derivations for the dependence of wr on the learning rates 
is described in the Methods: Derivation of the learning rule in 
Appendix. 

The second term 

is simply linear in the input current for a standard 
conductance-based neuron model. The derivative of the mem- 
brane potential depends linearly on the synaptic current 
hyn(t) = ER w fi4 0) (0 and thus 



3m 

3wr 



C ' 



(19) 



where 7^ 0) (f) is the single channel current through a receptor of 
type R and C is the membrane capacitance. While the current 
through a single channel ijj is a quantity that can not be deter- 
mined by a synapse, the overall plasticity rule depends on the total 
synaptic current for the multiplicative case from (18). 

The third term 

Establishes the connection to the sparseness measure and is there- 
fore the one which is fundamental to the approach. Using the 
identities from (9) to (11), we take the derivative of S* with 
respect to u and expand in orders of the small time interval 8: 



Here, all remaining constant factors and parameters were com- 
bined into one parameter: the learning rate r\ = 2^/r\ g \]M 8 2 now 
has the dimension of seconds. 

The differential equations exhibit a strongly non-linear depen- 
dence on the mean-free membrane potential u. In both cases the 
functional form of this dependence is similar. If u is very small, the 
noise parameter a dominates the numerator and the expression 
is negative. For a positive u, the numerator will be zero for some 
intermediate value of u and become positive for larger values. The 
whole expression, dominated by the denominator, approaches 
zero again for sufficiently large depolarizations. This dependence 
of the plasticity on the membrane potential resembles the exper- 
imental findings on voltage-dependent plasticity as we will show 
in the next section. 

The other important functional dependence is on the corre- 
lation between the time derivative of the membrane potential 
u and the synaptic input current Ir. Neglecting the small noise 
parameter cr, the plasticity rule is proportional to their product as 



wr oc uIr. 



(25) 



Thus, it belongs to the class of so-called differential Hebbian rules 
where the weight change depends on the correlation between 
presynaptic activity and the derivative of postsynaptic activity 
(Kosko, 1986). 



dil 



(I 



2 )\u 3 \ +0(b) 



+ 0(8) 



8 2 . 



(20) 



All terms in S* which are of order 8° do not depend on u and 
the terms of order 8 1 cancel out exactly. Thus, we are left only 
with terms of order 8 2 or higher. The overall factor 8 2 can be 
absorbed into the learning rate and will be neglected. For u —> 0 
the higher-order terms in 8 can not be neglected. Instead of using 
the full gradient expression, we introduced a simple regulariza- 
tion parameter y. It is a constant phenomenological parameter 
which represents the higher-order terms and thereby prevents the 
expression from diverging: 



dii 



1 (uu 
4 



2a 2 ) 



u 6 + y 6 
a»y 1 hit — 2a 2 ( 

' 4 \¥\ ' 



(21) 
(22) 



DIFFERENTIAL HEBBIAN LEARNING 

The final plasticity rule with all three terms combined, for skew- 
ness as the objective function, is 



wr := r| 



1 I R (t) (uu - 2a 2 ) 



4 C 



u 6 + y 6 



(23) 



Using kurtosis as the sparseness measure results in the similar 
expression 



wr := r| 



2I R (t) (uu - 1.5a 2 ) i 



3 C 



tr + y 



(24) 



RESULTS 

WEIGHT DISTRIBUTION AND SKEWNESS 

The learning rule was derived from the goal of maximizing skew- 
ness. Therefore, we assessed the effect of the learning rule on the 
distribution of synaptic weights and its skewness. 100 synapses 
located at 200 (Jim from the soma received independent presy- 
naptic Poisson spike trains at 3 Hz. The weight distribution was 
estimated over the population of 100 synapses for several trials. In 
each trial every synaptic efficacy was initially set to wq = 0.16 mV 
and new random spike trains were sampled over the given dura- 
tion (see Figure 1). Since we did not employ any saturation effects 
or hard bounds, some synapses grew strong enough to trigger 
a postsynaptic spike. This spike strongly biased the measured 
strength of these synapses (see Methods). We, therefore, excluded 
all synapses with a strength larger than 10 mV from our skewness 
analysis to remove these non-linear effects. 

The simulations were done with our standard multiplica- 
tive weight change and also for the additive weight change (see 
Methods and Appendix). The reason for this was that any mul- 
tiplicative weight change following some distribution ultimately 
leads to a lognormal distribution according to the central limit 
theorem and Gibrat's law. And such a lognormal distribution will 
always exhibit a positive skewness. Thus, the effect of the learning 
rule might not be distinguishable from random weight changes. 
On the other hand, for an additive random weight change fol- 
lowing some distribution the synaptic efficacies would become 
Gaussian due to the central limit theorem. A lognormal distri- 
bution even for additive changes shows that the objective of an 
increased skewness is actually achieved by the learning rule. 

Figure 1 shows the cumulative density function of synaptic 
weights for two different simulation durations (T = 30 s, 300 s) 
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Synaptic weight (mV) 



FIGURE 1 | Cumulative density function of synaptic weights for 
additive and multiplicative learning rule. The weights for the 100 
synapses are initialized at wq = 0.16 mV and shown for two different 
simulation durations (upper and lower panel). The cumulative densities for 
multiplicative and additive weight changes are fitted with a lognormal 
(straight and dashed line). Additionally, a Gaussian (dotted line) is fitted to 
the additive weight changes. 



in log-scale. The cumulative density function, rather than the 
probability density function, was chosen to eliminate statistical 
problems with the choice of the histogram bin sizes. The weights 
from the multiplicative learning rule are well fitted by a lognormal 
distribution for both durations as expected. But also the additive 
learning rule leads to a cumulative density function which fol- 
lows a lognormal distribution. For the short simulation of 30 s 
(upper panel) the Gaussian fit is only slightly worse in terms of 
the root mean square of the residuals (RMSR) (RMSR\ osnorm!l i = 
0.0209, .RMS^Gaussian = 0.0238), but it becomes worse as the 
weight change progresses. The weights at 300 s show a slight 
deviation from the lognormal cumulative density function at 
low values while the Gaussian does not fit the data very well 
0.0227, .RMS£ Gaussian = 0.0814). 
Figure 2 shows the skewness of the weight distribution as a 
function of simulation duration. Both types of learning rules 
show an increase of skewness from the initial delta peak, which 
is not skewed, to values of around 4 after 5 min of simulation. 
These values are comparable to the value of 5.2 extracted from a 
lognormal fit to experimental data (Song et al., 2005). 

STDP: SPIKE PAIRINGS 

We tested if our approach can explain the basic phenomenon 
of STDP. The main protocol for studying STDP is by pairing 
pre- and postsynaptic action potentials at different time delays 
A at low repetition frequency. For a negative delay the postsy- 
naptic spike precedes the presynaptic spike (post-pre) and for 




1000 1500 2000 
Duration (s) 



3000 



FIGURE 2 | Skewness of the synaptic weight distribution over 
simulation duration. The horizontal dashed lines correspond to the value 
calculated from a lognormal fit to experimental data (Song et al., 2005). 
Results shown are calculated for one population of 100 synapses with the 
mean and variance of skewness averaged over several trials. 
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FIGURE 3 | Relative change of the synaptic strength in an STDP pairing 
protocol. The experimental data (Bi and Poo, 1998; Zhang et al., 1998) 
(triangles) show LTD for post-pre pairs and LTP for pre-post pairs with 
delays up to 30-40 ms. The results of our model are shown for two different 
synaptic locations on the apical dendrite: 200 |xm (solid line) and 400 |xm 
(dashed line) from the soma. The timescale of the LTD window increases 
with distance while the LTP window shows no dependence on the location. 



a positive delay the order is reversed (pre-post). We simulated 
7 spike pairings at 1 Hz for two different synapses at 200 u,m 
(wampa = 200 pS) and 400 u,m (wampa = 250 pS) distance from 
the soma. The results were in good agreement with the experi- 
mental data (Bi and Poo, 1998; Zhang et al, 1998) apart from the 
LTP at very short delays, which is predicted too strong (Figure 3). 

It should be noted that the 7 spike pairings we have used are 
not comparable to the more than 50 pairings which are usually 
required in experiments to achieve these amounts of plasticity. 
One possible explanation is the learning rate which we kept con- 
stant for all simulations in this manuscript in order to make 
consistent comparisons. But one could consider different learning 
rates for different cell types which would lead to different num- 
ber of spike pairings required. However, the question how many 
pairings are required is not unimportant. On the one hand, using 
more spike pairings would lead to much more plasticity in our 
simulations, which would be problematic especially for very short 
delays. On the other hand, it is also questionable that plasticity 
can be generated with such few pairings applied. For these cases 
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to be handled realistically we would need more detailed mecha- 
nisms for the expression and maintenance of plasticity which we 
did not include so far. 

Interestingly, the STDP window depended on the synaptic 
location. This is related to the broadening of the backpropagat- 
ing action potential (bAP) along the dendritic tree. The change 
in the timescale of the bAP decay affects the interaction of 
post-pre pairs. The width of the LTD window in our model is 
therefore increasing with distance from the soma, while the LTP 
window is relatively independent of the synaptic location. We 
show this by way of example for two synapses at 200 |xm and 
400 u,m (Figure 3). This feature of our model was also found 
experimentally (Froemke et al., 2005). 

STDP: FREQUENCY DEPENDENCE 

The standard STDP protocol consists of spike pairings repeated 
at low frequencies. Increasing this repetition frequency leads to 
interactions between spikes of different pairs. The LTP and LTD of 
these pairings do not add linearly as shown by experimental find- 
ings (Sjostrom et al, 2001). Figure 4 shows our simulation results 
for 100 pairings at positive (+10ms) and negative (—10ms) delays 
as a function of the repetition frequency compared to the exper- 
imental data. The modeled synapse was located proximally at 
150 [im and only contained NMDA receptors (wampa = 0 pS). 
This is comparable to the experimental conditions where the 
measured excitatory postsynaptic potential only showed one slow 
component with decay times of about 50 ms (Sjostrom et al., 
2001). At low frequencies below 5 Hz, the model predicted LTD 
for negative pairings but hardly any LTP for positive pairings. 

The vanishing LTP at low pairing frequencies might seem to 
be in conflict to the results of Figure 3, but we retrieved the miss- 
ing LTP with more distal synapses containing AMPA receptors. 
Interestingly, this is in line with results from Pawlak and Kerr 
(2008) showing a non-zero LTP for very low pairing frequen- 
cies (< 0.1 Hz). Their results were based on synapses containing 




0 10 20 30 40 50 

Repetition frequency (Hz) 



FIGURE 4 | Increasing the repetition frequency of the STDP pairing 
leads to non-linear effects. Delays were fixed at +10 ms (squares and 
solid line) or —10 ms (triangles and dashed line). The experimental data 
(Sjostrom et al., 2001 ) show LTP for pre-post at low frequencies up to 20 Hz 
which increases with frequency, while post-pre pairing leads to LTD with 
roughly constant amplitude. At frequencies above 30 Hz LTP for pre-post 
starts to saturate, but post-pre pairings exhibit a crossover from LTD to LTP 
At 50 Hz both result in strong LTP since pre-post and post-pre protocols 
become equivalent for A = ±10 ms. 



mainly AMPA receptors as demonstrated by the AMPA/kainate 
blocker CNQX, as compared to the setup by Sjostrom et al. (2001) 
with a dominant NMDA contribution to the EPSC. 

LTP for positive pairings increased with frequency and the LTD 
for negative pairings was converted to LTP for frequencies above 
30 Hz. At 50 Hz, where both protocols became equivalent, robust 
LTP was predicted by the model matching the experimental data 
(Sjostrom et al., 2001) (Figure 4). Thus, our model can account 
for the non-linear integration of LTP and LTD when looking 
beyond isolated spike pairs. 

Like for the standard STDP protocol, we observed a depen- 
dence on the synaptic location for this frequency-dependent 
STDP protocol. The robust potentiation at 50 Hz for ±10 ms pair- 
ings was only measured at proximal synapses. It converted to 
a slight depression for very distal locations more than 700 |xm 
from the soma. We observed this LTP-LTD switch for NMDA- 
only synapses as well as for synapses with a distance-dependent 
increase in AMPA receptors (Figure 5). These results of our 
model are qualitatively similar to experimental findings (Sjostrom 
and Hausser, 2006). 

RATE-DEPENDENT PLASTICITY AND METAPLASTICITY 

Depression and potentiation can also be induced without a con- 
trolled timing of pre- and postsynaptic spikes. We simulated 
this rate-dependent plasticity by independent pre- and postsy- 
naptic Poisson spike trains lasting 30 s. The simulated synapse 
was the same as in the STDP pairing protocol (200 [im dis- 
tance from the soma with wampa = 200 pS). The presynaptic 
frequency was fixed at 10 Hz and we probed the induced plas- 
ticity as a function of the postsynaptic conditioning frequency. 
Our model predicted LTD at low conditioning frequencies and 
LTP above a given frequency threshold (Figure 6). Furthermore, 
increasing the presynaptic frequency increased the amplitude of 
the observed change in synaptic strength (not shown). These 
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FIGURE 5 | STDP pairings with a delay of +10 ms at 50 Hz for different 
synaptic locations. Proximal synapses (<200 u.m) show strong LTP In the 
case of no AMPA receptors (dashed line), the potentiation decreases with 
distance starting at 200 |xm and eventually converts to LTD above 700 \im. 
Increasing the number of AMPA receptors with distance (solid line) 
abolishes the potentiation and converts it to LTD. Proximal synapses with a 
distance to the soma smaller than 200 |im were assumed to be dominated 
by NMDA receptors and did not contain any AMPA receptors (see 
Methods). The experimental data (triangles) show a similar switch from LTP 
to LTD as a function of distance (Sjostrom and Hausser, 2006). 
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FIGURE 6 | Activity-dependent metaplasticity in accordance with the 
BCM theory. The experimental data (Wang and Wagner, 1999) in the 
control conditions (filled squares) reveal slight LTD at low conditioning 
frequency (<5 Hz) and strong LTP above. After priming the postsynaptic cell 
with a 100 Hz stimulus (empty squares) the LTD window becomes larger 
(up to 30 Hz) and stronger (amplitude doubles). In the model the control 
case (filled circles) corresponds to an average postsynaptic activity history 
of 10 Hz, while in the priming case (empty circles) the neuron was 
stimulated with 100 Hz for 3 s before the conditioning. Upon priming the 
model displays the same leftward shift of LTD/LTP crossover point and an 
increase in LTD amplitude. 



results agree with the predictions of the BCM theory (Bienenstock 
et al., 1982) and also fit the data of its experimental verification 
(Wang and Wagner, 1999). 

Another prediction of the BCM theory is the so-called meta- 
plasticity, where the history of postsynaptic activity influences the 
observed plasticity. To simulate an increased postsynaptic activ- 
ity the cell is primed with a 100 Hz stimulus (Wang and Wagner, 
1999) which we applied for 3 s prior to the conditioning. The high 
postsynaptic activity increased the average membrane potential u 
at the synapse due to the bAP. This led to a larger amount of LTD 
(Figure 6) as found experimentally (Wang and Wagner, 1999). 
This homeostatic effect is of heterosynaptic nature, since the bAP 
influences u at all synapses. 

So far, metaplasticity has only been studied for the above men- 
tioned rate-dependent protocol. The level of abstraction used in 
our model does not allow us to make novel predictions in a quan- 
titative way. But combining STDP with a priming protocol for 
metaplasticity results in a clear qualitative prediction. Our model 
predicts that the width of the LTD window in an STDP proto- 
col will be smaller if it is preceded by strong priming (Figure 7). 
This prediction is a fundamental consequence of our learning 
rule given the influence of the priming, namely raising the mean 
membrane potential li. It is also not depending on the choice 
of skewness as the sparseness measure, but will be valid also for 
kurtosis or any other higher moment. 

VOLTAGE-DEPENDENT PLASTICITY 

Finally, we tested if our model can also explain results from an 
induction protocol based on clamping postsynaptic membrane 
voltage. To this end, we paired a presynaptic stimulation with 
postsynaptic depolarization. We clamped the soma of the post- 
synaptic cell at different voltages. A distal synapse (700 (Jim) with 
only NMDA receptors was stimulated 100 times at a frequency 
of 2 Hz following the protocol from Ngezahayo et al. (2000). 
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FIGURE 7 | Shape of the STDP pairing window depends on history 
of activity. Priming the postsynaptic cell with a 100 Hz stimulus 
(dashed line) alters the amplitude of LTD and LTP slightly. Most 
notably, the priming decreases the width of the LTD window by a 
factor of two: LTD is not induced for post-pre pairs with delays greater 
than ~10ms compared ~20ms under control conditions (solid line). 
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FIGURE 8 | Plasticity depends on postsynaptic depolarization. 

Simulation of a distal synapse (700 u,m from soma) with only NMDA 
receptors. Pairing presynaptic spikes with postsynaptic voltage clamping at 
the soma leads to no change for slight depolarization, LTD for intermediate 
and LTP for strong depolarization (solid line). We scaled the parameter a 
linearly with the initial synaptic strength for the depressed and potentiated 
synapses (see Text). Depressed synapses (dashed line) have a smaller LTD 
window and potentiated synapses (dotted line) a larger one. 



No change was induced when clamping the membrane voltage 
around the resting potential, while low depolarization led to LTD 
and higher depolarization resulted in LTP (Figure 8). 

The general trend of our results is in line with experimental 
findings (Artola et al., 1990; Feldman, 2000; Ngezahayo et al., 
2000). But the amount of plasticity was very sensitive to the 
synaptic location. In the current formulation of our model, plas- 
ticity can only be generated for synapses which are either far from 
the clamping position and/or have a spine with a small isolating 
neck. There, the influence of the voltage clamp is weaker and u 
is not absolutely zero but can slightly fluctuate. Also the effect of 
AMPA receptors in the synapse was quite pronounced. Increasing 
the number of AMPA receptors led to more LTP (not shown). 

In these voltage clamp simulations we also looked at the influ- 
ence of the phenomenological parameter cr which we had taken 
as fixed so far. a was introduced to account for the influence 
of noise of the voltage dynamics on its sparseness. This noise is 
partially originating from the stochasticity of ion channels and 
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synaptic receptors. We therefore scaled 0 linearly with the synap- 
tic strength. Interestingly, this homosynaptic scaling of a had a 
homeostatic effect. The LTD window of the depressed synapse 
was decreased, while it increased for the potentiated synapse 
(Figure 8). A similar shift in the voltage-dependent plasticity was 
observed experimentally and also shown to be of homosynaptic 
nature (Ngezahayo et al., 2000). 

DISCUSSION 

The phenomenon of synaptic long-term plasticity is driven by 
a complex mechanism and the diverse experimental induction 
protocols are only probing different aspects of this process. 
We showed that induction of spike-timing-, rate-, and voltage- 
dependent plasticity as well as metaplasticity all follow from 
the single computational idea of a sparse distribution of synap- 
tic strengths. Our approach has an intrinsic dependence on the 
synaptic location which agrees with the experimental findings. We 
therefore propose the maximization of sparseness as a central goal 
of synaptic plasticity. Our model also leads to a novel prediction 
of metaplasticity for STDP. 

We provide a two-fold unification compared to existing mod- 
els of synaptic plasticity. Detailed models (Lisman, 1989; Artola 
and Singer, 1993; Shouval et al., 2002) addressing the biophysi- 
cal mechanisms can not make conclusions regarding functional 
goals or consequences. Previous functional models (Toyoizumi 
et al, 2005; Sprekeler et al., 2007; Pool and Mato, 2011), how- 
ever, can not be connected to the biophysical mechanisms, since 
their learning rule is formulated in abstract ways. Also they have 
only addressed one specific induction protocol, namely STDP for 
spike pairs. In contrast, our functional model explains the find- 
ings from different induction protocols as emerging from one 
learning rule and it provides a connection to the underlying bio- 
physics by proposing a concrete, biologically plausible learning 
rule. Our approach bears some similiarity to the independently 
developed model by Yger and Harris (2013) which also follows 
the objective of increasing skewness of the membrane potential, 
however, without reference to the skewness of the weight dis- 
tribution. While their focus is more on unsupervised learning 
in a neural network, our focus lies with a biologically tractable 
learning rule and its effects in a fully morphological neuron. 
In contrast to the additionally required firing rate constraint by 
Yger and Harris (2013), our learning rule intrinsically contains a 
homeostatic metaplasticity to regulate the activity. 

WEIGHT DISTRIBUTION AND DENDRITIC PROPERTIES 

We have shown that the learning rule is capable of increasing 
skewness and can even reach values which are comparable to 
experiments (Figure 1). However, it is not directly clear where 
the skewness would saturate. On the one hand, we would require 
saturation mechanisms for the synaptic weight to prevent some 
weights from becoming too strong as already explained in the 
results section. On the other hand, we have used a very sim- 
plified setting, with only 100 synapses at equal distance from 
the soma, just to verify the increase in skewness. In order to 
actually assess the distribution in a realistic scenario, one would 
need to have many more synapses distributed along the dendritic 
tree with random initial strength. Also the input would need to 



be structured instead of random and could be provided within 
a network simulation. Ultimately, the setup would depend on 
the neuron type and brain region one would like to simulate. 
Assessing such scenarios is beyond the scope of this paper and 
it is not clear what the actual distribution of synaptic strength 
would look like. However, we suspect that the skewness would be 
increased by our learning rule irrespective of the exact scenario. 

If the increase in skewness would be a global one or if the 
optimization would be carried out locally in each branch of the 
dendritic tree, is an interesting question. This would depend on 
the geometry and the type of the dendrite, i.e., active or passive. 
For instance, considering active dendrites (London and Hausser, 
2005) would affect the distance-dependent differences in synap- 
tic weight: it improves the transmission of the bAP and can also 
boost the input of distal synapses. Furthermore, with the genera- 
tion of local or even global dendritic spikes the plasticity is far less 
dependent on somatic activity and the activity in one branch of 
the dendrite can influence the plasticity in connected branches. 

PLASTICITY PROTOCOLS 

A direct consequence of our differential-Hebbian learning rule 
is the asymmetric shape of the STDP window. Under condi- 
tions of an STDP protocol the time derivative of the membrane 
potential u will mainly depend on the bAP from the soma. Thus, 
plasticity is driven by an interaction of bAP and excitatory post- 
synaptic current (Markram, 1997). The location dependence of 
the STDP window in our model is due to the broadening of the 
bAP along the dendritic tree. For a pre-post pairing the synap- 
tic current mainly coincides with the rise of the bAP where u is 
positive, while in the post-pre case it only overlaps with the decay 
where u is negative. Such differential-Hebbian learning rules for 
STDP have already been proposed by Saudargiene et al. (2004). 
Their model was motivated by reinforcement learning algorithms 
(Kolodziejski et al, 2009), but they did not reproduce other 
plasticity phenomena than STDP pairing. A purely phenomeno- 
logical model by Albers et al. (2013), also based on a differential- 
hebbian learning rule, is able to fit a range of experimental results 
like triplets and bursting. Our approach provides a functional jus- 
tification for the use of differential Hebbian rules in STDP and 
reinforcement learning, since this type of learning arises naturally 
from the computational goal of sparse synaptic strength. 

An interesting point to speculate about are the observed anti- 
Hebbian STDP windows (Fino et al, 2005; Letzkus et al., 2006) 
where LTP is induced with post-pre spikes and LTD with pre-post 
spikes. An obvious idea would be to propose a minimization of 
the sparseness, instead of the maximization, which would intro- 
duce a minus sign in the overall learning and, thereby, identically 
reverse the induced plasticity. This would raise the question why 
some neurons would try to optimize the opposite objective func- 
tion. However, minimizing the sparseness might not be so strange 
as it first seems. It would lead to a distribution where most of 
the synapses are stronger and only a few would be weak. These 
weak connections can be neglected leaving many synapses with a 
very similar strength. Together with the homeostatic metaplastic- 
ity regulating the average strength, the cell would be driven by the 
equally weighted inputs of many synapses. In contrast, a cell max- 
imizing the sparseness is driven by only a few very strong inputs. 
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Previous models have proposed a specific triplet interaction 
for STDP (Pfister and Gerstner, 2006) to explain the results for the 
different repetition frequencies (Figure 4). We do not need any 
additional mechanism in our model to account for these findings. 
The robust LTP at high frequencies is due to different timescales of 
LTP and LTD in our model. Induction of LTP is short but strong 
due to the rising phase of the bAP, while induction of LTD is 
weaker but develops over a longer time. Thus, LTD needs more 
time to build up but its induction is cut off at high frequencies by 
the bAP of the next pairing repetition. So the contributions for 
potentiation will dominate the depression and effectively result 
in LTP. The vanishing of LTP at low frequencies arises due to 
the kinetics of the NMDA receptors. For a pre-post pairing the 
synapse is near the resting potential on the arrival of the presy- 
naptic spike. Then the NMDA channels are not fully relieved 
from their Mg 2+ -block at the bAP arrival and the synaptic cur- 
rent is low. At increasing frequencies the bAP from the previous 
pairing will provide increasing depolarization. This will gradually 
relieve the Mg 2+ -block and lead to stronger synaptic currents. For 
a post-pre pairing on the other hand, the cell is still depolarized 
from the bAP. Then the NMDA channel is already unblocked at 
low frequencies. 

The concept of metaplasticity from the BCM theory 
(Bienenstock et al., 1982) has a direct analog in our model and 
does not have to be put in explicitly. For BCM, the firing rate 
threshold regulating the LTD-LTP crossover is a function of the 
recent postsynaptic activation in order to maintain a homeostatic 
balance. In our learning rule, the average membrane potential u 
naturally accounts for this regulation. 

The induction of potentiation and depression fits qualitatively 
with the results from voltage clamp experiments. But the plastic- 
ity in our learning rule is linear in u which is zero at the location 
of the clamp. Only synapses which are sufficiently far, where the 
clamp effect is weaker, can experience a visible amount of plas- 
ticity. We hope that this shortcoming of our model in the current 
formulation can be resolved by extending it with the calcium con- 
centration as a proxy for the distribution of synaptic strength (see 
Future work). 

SPARSENESS OBJECTIVE 

We have postulated that the goal of synaptic plasticity is the 
maximization of sparseness of the synaptic weights. There exist 
many different ideas regarding the advantages of sparseness and 
sparse coding [see Olshausen and Field (2004) for a review]. 
Maybe the most appealing interpretation is in terms of energy 
efficiency. The brain needs to perform its computations with a 
limited energy budget. On the level of a single neuron it tries 
to maximize the information transmission for some average fir- 
ing rate (Attwell and Laughlin, 2001). Thus, sparse coding can 
be interpreted as a mechanism for energy-efficient coding given 
that the generation of action potentials is metabolically costly. 
But the postsynaptic effects of a presynaptic action potential also 
induce metabolic costs (Attwell and Laughlin, 2001). An exci- 
tatory postsynaptic current raises the postsynaptic membrane 
potential which subsequently has to be brought back to the resting 
state by restoring ionic concentrations. Given a sparse distribu- 
tion of synaptic strength, most synapses induce an excitatory 



postsynaptic potential (EPSP) with low amplitude and low energy 
consumption, while few synapses induce an EPSP with a high 
amplitude and high energy consumption. The energy for synap- 
tic transmission will be mainly spent in the few strong synapses, 
but these have a high probability of inducing a postsynaptic 
action potential. Thus, the objective of a sparse synaptic strength 
leads to an energy-efficient synaptic signaling which may be an 
evolutionary relevant factor. 

A second relevant view on the benefits of a sparse synaptic 
strength is the need for tuning and input specificity. For a neuron 
to transmit relevant information, it should be tuned to specific 
inputs and avoid an unspecific Gaussian distributed input. But, 
according to the central limit theorem, a sum of n independent 
random variables always approaches a Gaussian regardless of their 
distribution as n becomes large. Interestingly, a lognormal weight 
distribution slows down the convergence toward the Gaussian due 
to its large skewness (see Note: Effect of sparseness on the central 
limit theorem in Appendix). 

There are many different possible mechanisms that could 
achieve our computational goal of sparseness. Our model is a 
result of the optimization via a stochastic gradient ascent, the 
specific choice of the sparseness measure, and the use of the mem- 
brane potential as a proxy for the synaptic weight distribution. 
The stochastic gradient ascent is necessary to keep the prob- 
lem computationally simple. For the sparseness measure we have 
chosen skewness as a higher-order statistical moment. Similarly, 
one could apply kurtosis or any higher standardized cumulant 
which would result in largely similar learning rule. An interest- 
ing difference between the cumulants is the dependence on the 
mean-free potential u. A learning rule based on skewness (or any 
odd cumulant) will be negative if u is negative, while for kurto- 
sis (or any even cumulant) it does not depend on the sign of u. 
Thus, in the context of metaplasticity, a learning rule based on 
kurtosis does not lead to stronger depression after a 100 Hz prim- 
ing. These higher-order moments are usually considered unstable 
since they can be very sensitive to outliers. We believe, however, 
that this is not a problem in our approach since the neuron 
is a bounded dynamical system. The voltage 'samples' driving 
the synaptic changes are continuously distributed and have no 
outliers. 

BIOLOGICAL REALIZATION AND FUTURE WORK 

We have discussed the computational and algorithmic level of our 
approach. But our learning rule can also be related to mecha- 
nistic theories of synaptic plasticity. Its non-linear dependence 
on the membrane potential fits with the Artola-Brocher- Singer 
(ABS) theory (Artola and Singer, 1993) and is quite comparable to 
the common U-shaped Q, function characterizing the dependence 
of synaptic changes on the intracellular calcium concentration 
[Ca 2+ ]; (Shouval et al., 2002). There, the voltage dependence of 
synaptic plasticity has consistently been connected to the surge 
of [Ca 2+ ],-. In this so-called calcium control hypothesis (Lisman, 
1989) it is suggested that different levels of intracellular calcium 
have opposing influence on the Ca 2+ /calmodulin-dependent pro- 
tein kinase II (CaMKII). This affects synaptic strength, since 
CaMKII regulates the phosphorylation level of AMPA receptors 
which determines channel conductance. 
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Relating the membrane potential in our learning rule with 
the calcium concentration also fits with the question how the 
synapse should estimate u. It was hypothesized that the calcium- 
binding protein calmodulin, due to its kinetics, could serve as 
differentiator of the intracellular calcium concentration (Rao and 
Sejnowski, 2001). However, an implementation of our learning 
rule might not necessarily employ a continuous time derivative 
at every moment in time. A simple estimate of the correlations 
between the synaptic current and the time derivative of the volt- 
age (or calcium concentration) could be sufficient. In this respect, 
it was shown that the amplitude of synaptic change in an STDP 
protocol is strongly correlated with the initial rate of increase 
of the intracellular calcium concentration over the stimulation 
period (Aihara et al., 2007). 

The strong dependence of our learning rule on the membrane 
potential is due to the choice of u as a proxy to estimate the synap- 
tic weight distribution. It was chosen since its dependence on the 
presynaptic input and the synaptic efficacies are easy to formu- 
late. Given the important role of calcium in mechanistic models, 
an important extension to our approach would be to take the 
intracellular calcium concentration as a proxy. Formulating the 
objective in this quantity would allow for a very direct connec- 
tion to the established mechanistic theory of the calcium control 
hypothesis. The derivation of a learning rule in this case would 
involve the dependencies of the calcium concentration on presy- 
naptic input, membrane potential, and release from intracellular 
stores. 

With the introduction of a calcium proxy, we expect new 
effects to be introduced in the results since the non-homogenity 
of the cytoplasmic calcium could affect the learning rule. For 
example, considering calcium nanodomains we would expect that 
the calcium signal would provide information about a much 
smaller neighborhood of synapses instead. The skewness would 
then potentially be optimizied for different regions individually 
instead of for the whole dendrite. However, the main term of the 
learning rule would depend on the change of the calcium con- 
centration (currently change of the membrane potential), so the 
effects would not depend so much on the calcium distribution 
itself but rather on the spatial transmission of changes in this 
distribution. 

With this extension, also the dependence on NMDA and 
AMPA receptor strength could be formulated more realistically. 
So far we assumed that the weight change of each receptor type 
only depends on its own synaptic current. But synaptic plasticity 
is calcium-dependent and thus mainly mediated by NMDA recep- 
tors. Normal AMPA receptors only indirectly affect the calcium 
concentration due to voltage-dependent calcium channels and 
the voltage-dependent magnesium block of NMDA receptors. But 
also the role of calcium-permeable AMPA receptors is just emerg- 
ing (Man, 2011). All of these dependencies can be incorporated 
into our approach by extending it with a calcium concentration 
proxy. 

A second important step toward a biologically realistic model 
is to take into account the actual mechanisms which contribute 
to the weight change. Our approach is mainly focused on the 
induction of plasticity and for its expression we have used 
the simplifying assumption that the number and single-channel 



conductances of the NMDA- and AMPA- type glutamate receptors 
are independently adapted as continuous variables. Incorporating 
realisitic expression, saturation, and maintenance of plasticity 
into our approach could be done with an explicit model of the dis- 
crete conductance/phosphorylation states, their transitions and 
a model of receptor insertion and removal. These mechansims 
could potentially account for the discrepancies in our simulations 
of the STDP pairing protocol. There, the plasticity for short delays 
grew too strong (requiring saturation) and the number of applied 
pairings is usually not sufficient to produce long-term plasticity 
in experiments (requiring some kind of threshold for expression 
or maintenance). 

Also spine neck geometry can influence the results of our 
learning rule. A strengthened synapse would not only have a large 
EPSP amplitude at the soma, but also locally in the spine head. 
This leads to a stronger potentiation since the synaptic current 
correlates more strongly with the local EPSP. Such a positive feed- 
back could be avoided by regulating the spine neck resistance to 
achieve a "standardized" EPSP in the spine head (Gulledge et al., 
2012). 

Finally, it has been shown recently that the metaplasticity 
threshold is actually independent of postsynaptic action poten- 
tials but rather depends on calcium (Hulme et al., 2012). Since 
our model in the current formulation intrinsically accounts for 
BCM-like metaplasticity as a function of the mean membrane 
potential, we expect that this influence of calcium be accounted 
for when using the calcium proxy extension. 

CONCLUSIONS 

In the light of our findings, we propose that synaptic plastic- 
ity is arising from the computational goal of sparse signaling. 
David Marr proposed to distinguish three levels of theoreti- 
cal analysis: computational goal, algorithm, and implementa- 
tion (Marr, 2010). Our proposal of a sparse synaptic strength 
(computational goal) leads to a differential-Hebbian learning 
rule (algorithm) which can be mapped onto the biological 
mechanisms (implementation), thereby, covering all three lev- 
els of analysis. The principle of sparseness would then be even 
more fundamental to neuronal computation than previously 
assumed. Given the interpretation in terms of energy efficiency 
it can be considered a fundamental factor that has shaped 
the mechanisms of neuronal plasticity during nervous system 
evolution. 

METHODS 

We used full morphological simulations in NEURON (Carnevale 
and Hines, 2006). All simulations were done with a layer 5 pyra- 
midal neuron (Mainen et al., 1995) provided on ModelDB (Hines 
et al., 2004) (accession number 8210). 

The plasticity rule depends on several parameters falling into 
two categories: The phenomenological model parameters, like the 
learning rate, were fixed for all experiments. The values are listed 
in Table 1. 

The physiological parameters comprise the morphology of the 
dendritic tree, the synaptic location on this tree, the spine geom- 
etry, the initial strength of the synapse, and the contributions of 
the different receptors types (AMPA and NMDA) as well as their 
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Table 1 | The phenomenological model parameters. 



Description 


Symbol 


Value 


Learning rate 


1 


1.5s 


Regularization for u 


y 


10 mV 


Noise level 


a 2 


0.036 mV 2 ms- 1 


Time constant for ii 




30 s 



Table 2 | The parameters for the conductance time course g(t) of 
AMPA and NMDA. 



Parameter AMPA NMDA 

(Spruston et al., 1995) (Kinney et al., 1994) 





0.55 ms 


4.05 ms 




2.0 ms 


27.6 ms 


trf. 


8.0 ms 


147.4 ms 


X 


0.8 


0.5 


A/ 


2.00 


1.36 




g(t) = N(\ exp[-f/x d ,] + (1 - X) exp[-f/x d2 ] 


- exp[-f/tr]) 



gating kinetics. Those parameters are also relevant in the exper- 
imental preparations and can vary between different cell types, 
brain area, etc. 

The dependence of the resulting plasticity on the exact mor- 
phology of the tree is beyond the scope of this work. And although 
the spine geometry is found to depend on the dendritic location 
(Jones and Powell, 1969; Berard et al., 1981) and influence plas- 
ticity (Yuste and Bonhoeffer, 2001; Gulledge et al, 2012), it was 
also taken to be fixed. The spine consisted of a neck (1 |xm long, 
0.1 (Jim thick) and the head (0.6 |xm long, 0.3 (Jim thick) as used 
by Koch and Poggio (1983). 

The time course of the synaptic conductance of a single 
AMPA/NMDA receptor was modeled by the sum of three expo- 
nentials g(t) with a normalized maximum: one exponential for 
the rising phase (time constant x r ), two for the decaying phase 
(time constants T^T,^), and their relative strength X. The 
parameter values are listed in Table 2. 

g^Ht) = SAMPA(f) (26) 

& NMDA (f, u) = ^NMDA(f) (l + ®^pl exp [-au]\ (27) 

The additional term accounts for the voltage-dependent magne- 
sium block of the NMDA receptor (Gabbiani et al., 1994). The 
postsynaptic action potentials were elicited by current injection 
of 2 mA for 3 ms at the soma. 

WEIGHT AND WEIGHT CHANGE 

The total synaptic conductance of a synapse was a weighted sum 
of both receptor types: 

gtotal(f) = W AM PA& AMPA (0 + WNMDA£ 0 NMDA ( f ) (28) 



Both weights wr independently followed the differential 
Equation (23) and were changed continuously. The strength of 
a synapse was measured as the EPSP peak amplitude at the soma. 
The increase/decrease of synaptic strength was calculated as the 
relative change of this peak amplitude. 

The initial value of the NMDA weight was taken to be inde- 
pendent of the distance to soma with wnmda = 500 pS. The 
influence of the AMPA current on the plasticity is usually small, 
since it is mainly calcium dependent. But due to the currently 
simpler version of our model (membrane potential is used as a 
proxy quantity instead of calcium concentration), the effect of the 
AMPA current was too strong and wampa was adjusted to fit the 
experimental data. Simulated synapses with a distance to soma 
smaller than 200 (Jim did not contain any AMPA receptors. For all 
other synapses, wampa was increased with the distance to soma 
(Andrasfalvy and Magee, 2001) and the resulting total synaptic 
strength was on the order of 0.1 — 0.2 mV (Magee and Cook, 
2000). The actual weights are mentioned in the results section. 
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APPENDIX 

METHODS: DERIVATION OF THE LEARNING RULE 

Probability distribution 

Sparseness is a function of the moments of a probability distri- 
bution. To calculate these moments (or expectation values) over 
a time interval [to, to + T] of duration T, we need the probabil- 
ity Pt( u = u o) that the membrane potential u takes a particular 
value wo- This is just the time integral over the conditional 
probability p(u\t) times the "prior" probability p(t) (which is 
uniform): 



u)= f 

J to 



to + T 



dtp(u\t)p(t) 

to + T 

dt p(u\t) 
dtp(u\t + t 0 ) 



(Al) 



In the deterministic case, we can predict u(t) given initial condi- 
tions. Therefore, p{u\t + to) = 8 (u — u(t + to)). 

In a real neuron, we can not predict u(t) for all times, 
but only for a very short time period. Therefore, we use a 
stochastic gradient approach. In a stochastic scheme with dis- 
crete time one substitutes the required expectation value by a 
sample. For the continuous time dynamical system of a neu- 
ron we take a sample by calculating the probability distribu- 
tion in a future infinitesimal time interval [to, to + 8]. Thereby, 
the membrane potential can be approximated with first-order 
dynamics. Assuming an additional iid Gaussian noise (which 
accounts for synaptic input and the noise of ion channels) leads 
us to a Wiener process including drift: u(t + to) ~ u,(f) + oX t 
with u,(f) = «(fo) + tu(to). Thus, the conditional probability 
will be 

p s (u|f + f 0 ) =N(u; [i(t),o 2 t). (A2) 
In the noiseless limit of cr — >■ 0 we recover the delta distribution. 

Sparseness measures 

We can consider different measures for sparseness. We investi- 
gate skewness S and kurtosis excess K as sparseness measure to 
be maximized: 



|X 3 



(A3) 



jdu(u-u)"^j dtp(u\t+t 0 ) 
i j dt j du (u — u)"p(u\t + t 0 ) 

dt j duM (#; |x(t) -u,o 2 t)u n 



(A5) 



where M„ is just the n-th raw moment of a Gaussian 



M 2 = |x 2 + a 2 t 



M 3 
M 4 



(x 3 + 3(xa 2 t 

|1 4 + 6|l 2 cr 2 t + 3a 4 t 2 



(A6) 
(A7) 
(A8) 



with |x = u,(t) — u. The integral can now be calculated analyti- 
cally and we are able to compute the derivatives as: 



3S 
du 



\o 2 ) |w 3 | +0(8) t«« 



2° 



M 6 + O (8) 



dK (|«h - 0 2 ) u 5 + O (8) 
Jk a u 8 + O (8) 



2 ■ - 2 
^MM — <T 



(A9) 



(A10) 



Mean membrane potential 

For u — > 0 one cannot neglect the higher-order terms in 8. Instead 
of using the full gradient expression for our learning rule, we 
introduce a simple regularization parameter, y is a constant 
phenomenological parameter which represents the higher-order 
terms and thereby prevents the expression from diverging: 



as 

du 

dK 
du 



2 ) I" 3 



ii 6 + y 6 



[iuu ■ 



2 ) u 5 



u + y 



(All) 



(A12) 



Sparseness is measured with respect to the mean of the probabil- 
ity distribution. In our case, it depends on the average membrane 
potential u. Due to the use of the stochastic gradient approach 
the maximization of the sparseness is achieved as a long-term 
average. Thus, the mean potential u should be basically a low- 
pass filtered version of the voltage dynamics. It will be vary- 
ing much more slowly than the instantaneous potential and its 
derivative. We compute it as an exponentially weighted average: 

du It — u 



K 



1X4 

rr4 



(A4) 



To compute these measures (and their derivatives) we calculate 
the expectation values of the mean-free membrane potential ii = 
u — u with our stochastic approximation: 



Weight dependence 

We use standard neuronal dynamics 



Cil = I]#ion ( £ ion - u ) + ^Mt)- 
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The synaptic current of a given receptor type _R is given by 
Mt)=N R gRg*(t) (B^n-tf) 



with the normalized current of a single receptor I® (f)> its maxi- 
mal conductance gn, and the total number of receptors Nr. 

We optimized both N and g simultaneously with individual 
learning rates: 



dw du dS 
dN dw du 



dw du dS 
= ^^wVu=^ NQ 



(A13) 
(A14) 



Here, ST2 = |^ || does not depend on N and g. Thus, we have two 
coupled differential equations. One can decouple them as 



(A15) 



by introducing X\,2 = ± ^7= and X. lj2 = iyngfjjv. 

Observing that 3(J ^' X2> = 0 it follows that N and g are 
constrained on a hyperbola: 



learning rule. For example keeping g fixed (r^ = 0) and only 
updating N will lead to 

w = W 2fi - 

For finite values of r 2 the learning rule will be multiplicative for 
large w and gradually become additive for smaller w. 

NOTE: EFFECT OF SPARSENESS ON THE CENTRAL LIMIT THEOREM 

The central limit theorem states that the distribution of the sam- 
ple mean of n i.i.d. random variables converges to a normal 
distribution as n goes to infinity: 



»(G|>H 



7V(0, a 2 ) 



with E(X) = \i and E [(X - u,) 2 ] = Var (X) = a 2 . 

The Barry-Esseen theorem attempts to quantify the rate of 
convergence in the central limit theorem. It provides an upper 
bound on the absolute difference between <t> (the cdf of the stan- 
dard normal distribution) and F„ (the cdf of the normalized 
sample mean of the i.i.d random variables). It states that for all 
x and n there exists a positive constant C such that 



\F n (x) - <t>(x)\ < 



Cp 



g^_N^ 

% TIN 



const. 



With this constraint we can decouple the initial differential 
equations as 



N = + T) N r 2 



g = \ttJg 2 - x\ g r 2 



(A16) 
(A17) 



with X = X\ = y/r\gf\N- The overall synaptic strength w evolves 
as 



w = gN + Ng = \ 



(^Jw 2 + r\ N g 2 r 2 + tJw 2 — T) tf iV 2 r 2 ^ £2 . 



(A18) 

The simulation results in this paper were obtained with r 2 = 
0. For this case the hyperbola degenerates to a linear relation- 
ship between N and g and the learning rule will be purely 
multiplicative 

w = XkwQ.. 

In the other extreme cases of r 2 — >- ±00 we only change one 
variable and keep the other fixed. This leads to a purelyadditive 



with the third absolute moment £ [|X — u,| 3 ] = p < 00. 

Let us interpret the inverse of the upper bound as a conver- 
gence rate 

k:= 



Cp 



Given a fixed number of variables, the rate can be minimized 

by maximizing the ratio or a lower bound of it. Since p is 
a 

the third absolute moment, a trivial lower bound for this ratio is 
always the skewness S which is the third central moment divided 



by ct 3 , i.e., S < 



But interestingly, for the log-normal dis- 



tribution [which synaptic strength is observed to follow (Song 

et al., 2005; Loewenstein et al., 2011)] one also finds that — < 

a D 

S(l + a) where a oc — for a — > 00. That means, S < < 

S(l + a) and the lower bound will become tight for sufficiently 
large a. Specifically, for a log-normal fit to experimental data 
(Song et al., 2005) (X ~ InTV (-0.702, 0.9355 2 ) ) the bound is 
already very close with a < 0.05. Thus, maximizing the skew- 
ness of synaptic strength will minimize the convergence rate k. 
The total synaptic current will only slowly converge to a Gaussian 
distribution and remain sparse even for a large number of 
inputs. 
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